
Conversation

@venkywonka
Collaborator

@venkywonka venkywonka commented May 19, 2025

Description

Add llama-3.1-nemotron-ultra-253b-v1 perf test coverage.
This only adds the cpp backend for fp8: a model this large is rarely run in bf16, so fp8 is the only resource-efficient precision to test. The PyTorch backend's fp8 path requires pre-quantized fp8 checkpoints, which we currently don't have added.

Invariants

| Setting | Value |
|---|---|
| GPUs / TP | 8 / 8 |
| Engine dtype | bfloat16 → quant-FP8 |
| max_batch_size | 64 |
| backend | cpp (TRT) |
| benchmarking backend | trtllm-bench |

Four sequence profiles were benchmarked:

  • C1 & C2 (low concurrency): reqs = 8, con = 1
  • C3 & C4 (high concurrency): reqs = 250, con = 250
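For reference, the four profiles can be restated as a small table in code. The dict layout below is just a convenient restatement of the numbers above; it is not the YAML schema actually used in the QA test list.

```python
# Restatement of the four benchmarked sequence profiles from the PR
# description. Key names are descriptive, not the test-list schema.
profiles = {
    "C1": {"input_len": 5000, "output_len": 500,  "requests": 8,   "concurrency": 1},
    "C2": {"input_len": 500,  "output_len": 2000, "requests": 8,   "concurrency": 1},
    "C3": {"input_len": 5000, "output_len": 500,  "requests": 250, "concurrency": 250},
    "C4": {"input_len": 500,  "output_len": 2000, "requests": 250, "concurrency": 250},
}

# Two low-concurrency and two high-concurrency shapes, as stated above.
low = [p for p in profiles.values() if p["concurrency"] == 1]
high = [p for p in profiles.values() if p["concurrency"] == 250]
assert len(low) == 2 and len(high) == 2
```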

Performance Summary

| ID | Con. | Input | Output | Req/s | Output TPS (tok/s) | Avg Latency (ms) | TPS/GPU |
|---|---|---|---|---|---|---|---|
| C1 | 1 | 5000 | 500 | 0.074 | 37.08 | 13,484 | 4.64 |
| C2 | 1 | 500 | 2000 | 0.020 | 40.81 | 49,011 | 5.10 |
| C3 | 250 | 5000 | 500 | 0.453 | 226.29 | 388,037 | 28.29 |
| C4 | 250 | 500 | 2000 | 0.387 | 773.77 | 433,890 | 96.72 |
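The derived columns in the summary are internally consistent: TPS/GPU is Output TPS over the 8 GPUs, and Req/s is approximately Output TPS divided by the output length (a steady-state approximation). A quick sanity check over the table values:

```python
# Sanity-check derived columns of the performance summary.
# Values copied from the table above; 8 GPUs per the invariants.
rows = {
    "C1": {"output_len": 500,  "req_s": 0.074, "tps": 37.08,  "tps_per_gpu": 4.64},
    "C2": {"output_len": 2000, "req_s": 0.020, "tps": 40.81,  "tps_per_gpu": 5.10},
    "C3": {"output_len": 500,  "req_s": 0.453, "tps": 226.29, "tps_per_gpu": 28.29},
    "C4": {"output_len": 2000, "req_s": 0.387, "tps": 773.77, "tps_per_gpu": 96.72},
}
NUM_GPUS = 8
for name, r in rows.items():
    # TPS/GPU should equal Output TPS / 8 (to rounding).
    assert abs(r["tps"] / NUM_GPUS - r["tps_per_gpu"]) < 0.01, name
    # Req/s should approximate Output TPS / output length.
    assert abs(r["tps"] / r["output_len"] - r["req_s"]) < 0.001, name
```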

Latency percentiles

Concurrency = 1 (C1 & C2)

| Shape | P50 | P90 | P95 | P99 | Min | Max |
|---|---|---|---|---|---|---|
| 5k × 500 | 13,620 | 13,751 | 13,751 | 13,751 | 12,798 | 13,751 |
| 500 × 2k | 48,996 | 49,121 | 49,121 | 49,121 | 48,943 | 49,121 |

Concurrency = 250 (C3 & C4)

| Shape | P50 | P90 | P95 | P99 | Min | Max |
|---|---|---|---|---|---|---|
| 5k × 500 | 391,470 | 549,776 | 550,795 | 551,404 | 141,348 | 551,507 |
| 500 × 2k | 375,050 | 645,705 | 645,794 | 645,849 | 182,317 | 645,852 |

@venkywonka venkywonka requested review from LarryXFly, Copilot, kaiyux, ruodil, schetlur-nv and tijyojwad and removed request for Copilot and ruodil May 19, 2025 13:35
@venkywonka venkywonka marked this pull request as ready for review May 19, 2025 13:35
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces performance tests for the Llama-3_1-Nemotron-Ultra-253B-v1 model using the cpp TRT backend to ensure that both low- and high-concurrency scenarios pass within CI limits.

  • Added new test entries with appropriate parameters (max batch size, input/output lengths, concurrency, etc.) in the QA test list.
  • Updated model mapping in test_perf.py to include the new ultra model for both native and Hugging Face identifiers, and appended a build flag when remote code is trusted.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml Added new performance test entries for Llama-3_1-Nemotron-Ultra-253B-v1.
tests/integration/defs/perf/test_perf.py Added new model mapping entries and introduced a build flag for TRUST_REMOTE_CODE_MODELS.

Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@venkywonka venkywonka requested a review from Copilot May 19, 2025 14:17
Contributor

Copilot AI left a comment


Pull Request Overview

Adds C++-backend FP8 performance tests for the llama-3.1-nemotron-ultra-253b-v1 model and hooks them into the test runner, including enabling remote code trust for quantized builds.

  • New YAML entries in trt_llm_release_perf_test.yml for low/high concurrency FP8 benchmarks
  • Model mapping definitions added in test_perf.py for both C++ and HF backends
  • Auto-enables --trust_remote_code for models listed in TRUST_REMOTE_CODE_MODELS during build
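The auto-enable behavior described above can be sketched roughly as follows. `TRUST_REMOTE_CODE_MODELS` and the `--trust_remote_code` flag are named in this PR, but the surrounding function and variable names below are illustrative assumptions, not the actual `test_perf.py` code.

```python
# Illustrative sketch of the gating described in the review; not the
# actual test_perf.py implementation.
TRUST_REMOTE_CODE_MODELS = {
    # Hypothetical allow-list entries mirroring the mappings in this PR.
    "llama_v3.1_nemotron_ultra_253b",
    "llama_v3.1_nemotron_ultra_253b_hf",
}

def build_extra_args(model_name: str) -> list:
    """Append --trust_remote_code only for explicitly allow-listed models."""
    args = []
    if model_name in TRUST_REMOTE_CODE_MODELS:
        # Per the review's security note: remote code will be executed,
        # so models must be audited before being added to this set.
        args.append("--trust_remote_code")
    return args
```

This keeps remote-code trust opt-in per model rather than enabling it globally, which is the mitigation the review comment asks to document.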

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml Added four new perf test invocations covering C1–C4 scenarios for the new Ultra-253B model
tests/integration/defs/perf/test_perf.py Registered llama_v3.1_nemotron_ultra_253b[_hf] mappings and added --trust_remote_code flag
Comments suppressed due to low confidence (3)

tests/integration/defs/perf/test_perf.py:58

  • The repository path uses 'Llama-3_1...' with an underscore instead of the dot notation ('Llama-3.1...'), which is inconsistent with other model paths and may break resolution. Update to match the existing naming convention.
"nemotron-nas/Llama-3_1-Nemotron-Ultra-253B-v1",

tests/integration/defs/perf/test_perf.py:105

  • The HuggingFace model path uses an underscore in 'Llama-3_1...' instead of 'Llama-3.1...'; this diverges from established naming and may cause lookup failures. Please correct it.
"nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",

tests/integration/defs/perf/test_perf.py:932

  • Automatically setting --trust_remote_code=True can pose security risks if unreviewed code is pulled. Ensure these models are audited or document why remote code trust is safe here.
if self._config.model_name in TRUST_REMOTE_CODE_MODELS:

@venkywonka venkywonka force-pushed the user/venky/ll-nemo-ultra-perf-tests-cpp branch from acbc33c to 02e6b02 Compare May 19, 2025 15:11
Signed-off-by: Venky Ganesh <[email protected]>
@venkywonka venkywonka force-pushed the user/venky/ll-nemo-ultra-perf-tests-cpp branch from eb62a33 to 0bb8c75 Compare May 22, 2025 13:39
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6157 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6157 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4502 completed with status: 'SUCCESS'

@schetlur-nv schetlur-nv merged commit c713eb5 into NVIDIA:main May 22, 2025
3 checks passed
venkywonka added a commit to venkywonka/TensorRT-LLM that referenced this pull request May 22, 2025
chzblych pushed a commit that referenced this pull request May 28, 2025
darraghdog pushed a commit to darraghdog/TensorRT-LLM that referenced this pull request Jun 3, 2025